1 Introduction


I think the best introduction to assembly programming is [Patterson, Hennessy] chapters 3
and 4. I assume you have read those chapters and know how to program in MIPS 
assembly, and are looking for a short guide on how to program in IA-32 assembly. 
Unfortunately all the books and tutorials I have read are: 


1. Too long (as a student I didn’t have time to read 1000 page books or tutorials). 

2. Assuming that the reader is programming in MS-DOS. 

3. Assuming that the reader needs to know about unimportant topics such as BCD
arithmetic.


Therefore I wrote this short paper that hopefully teaches you all the basic things you need 
to know to start programming in IA-32 assembly. The topics covered are: 


• Most important aspects of the IA-32 architecture (registers, addressing modes,
stack).

• MASM assembler directives (i.e. how to use MASM to write IA-32 assembly
programs).

• How to use assembly code in your Visual C++ programs.

• How to read assembly listings produced by the Microsoft C compiler.


2 IA-32 assembly programming 


This chapter is intended to be a reference you can use when programming in IA-32 
assembly. It covers the most important aspects of the IA-32 architecture. 


2.1 Assembly Language Statements 

All assembly instructions, assembler directives and macros use the following format: 
[label] mnemonic [operands] [; comment] 

Fields in square brackets are optional. 

Label: Used to represent either an identifier or a constant.

Mnemonic: Identifies the purpose of the statement. A mnemonic is not required if a line
contains only a label or a comment.

Operands: Specify the data to be manipulated.

Comment: Text ignored by the assembler.

Example:


; This is a comment
jmp label1 ; This is also a comment
add eax, ebx

label1:
sub edx, 32

Here "label1:" defines a label, jmp, add and sub are mnemonics, the items that follow a
mnemonic (label1, eax, ebx, edx, 32) are operands, and the text after each semicolon is a
comment.


Most instructions take two operands. Usually one of the operands is in a register, and the
other can be in a register, in memory or an immediate value. In many instructions the
first operand is used as both source and destination.


Example: 








add eax, ebx ; EAX = EAX + EBX 











2.1 Modes 


Normally we only run in protected mode. But the Pentium processor can also run in real 
mode (for backward compatibility), system management mode (power management) and 
virtual 8086 mode (for backward compatibility). 


2.2 Registers 
This chapter is a summary of chapters 2, 3 and 5 from [Dandamudi]. Most of the figures
and examples are taken from this book. If you want a more detailed explanation (or a
better written one) you should buy and read this book.


2.2.3 Data Registers 
IA-32 processors provide four 32-bit data registers, which can be used as:
• Four 32-bit registers (EAX, EBX, ECX, EDX)
• Four 16-bit registers (AX, BX, CX, DX)
• Eight 8-bit registers (AL, AH, BL, BH, CL, CH, DL, DH)

32-bit register (31...0)   Bits 31...16   Bits 15...8   Bits 7...0
EAX                                       AH            AL
EBX                                       BH            BL
ECX                                       CH            CL
EDX                                       DH            DL

AX, BX, CX and DX are the low 16 bits (15...0) of the corresponding 32-bit registers.








The data registers can be used in most arithmetic and logical instructions. But when 
executing some instructions, some registers have special purposes. 
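
To make the register table concrete, here is a small C sketch (my own illustration, not taken from [Dandamudi]) that treats a 32-bit integer as EAX and picks out AX, AH and AL with shifts and masks. All function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: "r" plays the role of the 32-bit register EAX. */
uint16_t reg_ax(uint32_t r) { return (uint16_t)(r & 0xFFFFu); }      /* bits 15...0 */
uint8_t  reg_ah(uint32_t r) { return (uint8_t)((r >> 8) & 0xFFu); }  /* bits 15...8 */
uint8_t  reg_al(uint32_t r) { return (uint8_t)(r & 0xFFu); }         /* bits 7...0  */

/* Writing AL replaces bits 7...0 only; the rest of EAX is untouched. */
uint32_t write_al(uint32_t r, uint8_t al)
{
    return (r & 0xFFFFFF00u) | al;
}

/* Small self-check that documents the expected values. */
void register_demo(void)
{
    uint32_t eax = 0x12345678u;
    assert(reg_ax(eax) == 0x5678);
    assert(reg_ah(eax) == 0x56);
    assert(reg_al(eax) == 0x78);
    assert(write_al(eax, 0xFF) == 0x123456FFu);
}
```

Calling register_demo() from a test program checks the values given in the comments. The point is that AL, AH and AX are not separate registers, only names for parts of EAX.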


2.2.4 Pointer and Index Registers 


The IA-32 processors have four 32-bit index and pointer registers (ESI, EDI, ESP and
EBP). These registers can also be used as four 16-bit registers (SI, DI, SP and BP).


Usually ESI and EDI are used as regular data registers. But when using the string 
instructions they have special functions. 


ESP is the stack pointer, and EBP is the frame pointer. If you don't use stack frames, you 
can use EBP as a regular data register. 


32-bit register (31...0)   Bits 15...0   Special function
ESI                        SI            Source index
EDI                        DI            Destination index
ESP                        SP            Stack pointer
EBP                        BP            Frame pointer





2.2.5 Control Registers 


The two most important control registers are the instruction pointer (EIP) and the EFlags 
register. 


The Pentium also has many other control registers, which are not covered in this
document (they control the operation of the processor, and applications cannot change
them).

The Instruction Pointer Register (EIP)

EIP points to the next instruction to be executed. EIP cannot be accessed directly.

The EFlags Register

Six of the flags in the EFlags register are status or arithmetic flags. They are used to
record information about the most recently executed arithmetic or logical instruction.
Three of the flags (SF, PF and AF) are rarely used.





• Zero Flag (ZF). This flag is set when the result of the last executed arithmetic
instruction was zero. ZF is used to test for equality or to count down to a preset
value. Related instructions are: jz and jnz.

• Carry Flag (CF). CF is set if the result of the last arithmetic operation (on two
unsigned integers) was either too big or too small (out of range). CF is used to propagate
carry or borrow, detect overflow/underflow or test a bit (using shift/rotate).
Related instructions are: jc, jnc, stc, clc, and cmc. Note that inc and dec do not
affect the carry flag.

• Overflow Flag (OF). OF indicates when an operation on signed integers resulted
in an overflow/underflow. Related instructions are: jo and jno.

• Sign Flag (SF). Indicates the sign of the result of an arithmetic operation. Related
instructions are: js and jns.

• Parity Flag (PF). Indicates the parity of the low 8 bits of the result produced by an
operation. PF = 1 if the byte contains an even number of 1 bits. It is used in data
encoding programs. Related instructions are jp and jnp.

• Auxiliary Flag (AF). Indicates whether an operation has produced a result that has
generated a carry out of, or a borrow into, the low-order four bits of 8-, 16- or 32-bit
operands. AF is used in arithmetic operations on BCD numbers.


One of the flags is a control flag:

• Direction Flag (DF). It determines whether string operations should scan the string
forward or backward. It is only used in string instructions. DF can be set by std
and cleared by cld.

The remaining ten flags are system flags. They are used to control the operation of the
processor. Ordinary application programs cannot set these flags directly.

• TF (trap flag)
• IF (interrupt flag)
• IOPL (I/O privilege level)
• NT (nested task)
• RF (resume flag)
• VM (virtual 8086 mode)
• AC (alignment check)
• VIF (virtual interrupt flag)
• VIP (virtual interrupt pending)
• ID (ID flag)



























































Examples

mov EAX, 8 ; EAX = 8 (mov does not change the flags)
sub EAX, 8 ; ZF = 1
cmp char, 0 ; ZF = 1 if char == '\0'
cmp EAX, EBX ; ZF = 1 if EAX == EBX

; for (i = 0; i < 12; i++)
mov ECX, 12 ; ECX = 12
again:
<do something>
dec ECX ; ECX = ECX - 1
jnz again ; Jump if ZF = 0

mov AL, 100
add AL, 200 ; CF = 1 (300 does not fit in 8 bits)
mov AX, 100
sub AX, 101 ; CF = 1 (any negative integer is out of range)
mov AL, 100
add AL, 30 ; OF = 1 (signed char range is -128...127)


Note that the processor does not know if you are using signed or unsigned integers. OF 
and CF are set for every arithmetic operation. 


mov AL, 15 
add AL, 100 ; SF = 0 (positive result) 
mov AL, 15 
sub AL, 100 ; SF = 1 (negative result) 
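
The flag examples above can be checked with a small C model. This is only a sketch of how ZF, CF, OF and SF follow from an 8-bit add or sub. The struct and function names are my own, and PF and AF are left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four flags modelled in this sketch. */
struct flags { int zf, cf, of, sf; };

/* Flags after the 8-bit addition a + b (like "add AL, imm8"). */
struct flags add8_flags(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + (unsigned)b;   /* exact sum, 0...510 */
    uint8_t result = (uint8_t)wide;              /* what is left in AL */
    struct flags f;
    f.zf = (result == 0);
    f.cf = (wide > 0xFF);                        /* unsigned result out of range */
    f.sf = (result & 0x80) != 0;                 /* copy of the top bit */
    /* Signed overflow: operands have equal signs, result sign differs. */
    f.of = ((~(a ^ b) & (a ^ result)) & 0x80) != 0;
    return f;
}

/* Flags after the 8-bit subtraction a - b (like "sub AL, imm8"). */
struct flags sub8_flags(uint8_t a, uint8_t b)
{
    uint8_t result = (uint8_t)(a - b);
    struct flags f;
    f.zf = (result == 0);
    f.cf = (a < b);                              /* a borrow was needed */
    f.sf = (result & 0x80) != 0;
    /* Signed overflow: operands have different signs, result sign differs from a. */
    f.of = (((a ^ b) & (a ^ result)) & 0x80) != 0;
    return f;
}

/* The same cases as in the assembly examples. */
void flags_demo(void)
{
    assert(sub8_flags(8, 8).zf == 1);      /* sub EAX, 8 after EAX = 8 */
    assert(add8_flags(100, 200).cf == 1);  /* 300 does not fit in 8 bits */
    assert(sub8_flags(100, 101).cf == 1);  /* negative result, borrow */
    assert(add8_flags(100, 30).of == 1);   /* 130 > 127 */
    assert(add8_flags(100, 30).cf == 0);   /* but 130 fits unsigned */
    assert(add8_flags(15, 100).sf == 0);   /* positive result */
    assert(sub8_flags(15, 100).sf == 1);   /* negative result */
}
```

Note how add8_flags(100, 30) sets OF but not CF: the processor computes both flags every time, and it is up to the program to look at the one that matches its interpretation of the operands.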








2.2.6 Segment registers 
The Pentium processor has six 16-bit segment registers:

CS (code segment)
DS (data segment)
SS (stack segment)
ES (extra data segment)
FS (extra data segment)
GS (extra data segment)

Modern applications and operating systems (including Windows 2000 and Linux) use the
flat memory model (unsegmented memory model). In this model all segment registers are
loaded with the same segment selector. So all memory references are to a single
linear-address space.


2.3 Addressing 


Most of the figures and examples are taken from [Dandamudi] chapter 5. 


2.3.1 Bit and Byte Order 


Pentium processors use little-endian byte order: the least significant byte of a
multi-byte value is stored at the lowest memory address.
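
As an illustration of little-endian order, the following C sketch writes a 32-bit value into a byte array the way an IA-32 processor would lay it out in memory. The function names are hypothetical, and the code does not depend on the byte order of the machine it runs on.

```c
#include <assert.h>
#include <stdint.h>

/* Store value at mem[0..3], least significant byte first. */
void store_le32(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)(value & 0xFF);          /* lowest address: bits 7...0 */
    mem[1] = (uint8_t)((value >> 8) & 0xFF);
    mem[2] = (uint8_t)((value >> 16) & 0xFF);
    mem[3] = (uint8_t)((value >> 24) & 0xFF);  /* highest address: bits 31...24 */
}

/* Read the value back from its little-endian layout. */
uint32_t load_le32(const uint8_t mem[4])
{
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}

void byte_order_demo(void)
{
    uint8_t mem[4];
    store_le32(mem, 0x12345678u);
    assert(mem[0] == 0x78);   /* low byte first */
    assert(mem[1] == 0x56);
    assert(mem[2] == 0x34);
    assert(mem[3] == 0x12);   /* high byte last */
    assert(load_le32(mem) == 0x12345678u);
}
```

This is why a memory dump of the doubleword 12345678H shows the bytes 78 56 34 12.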


2.3.2 Data Types 


Data Type    Size
Byte         8 bits
Word         16 bits
Doubleword   32 bits
Quadword     64 bits











2.3.3 Register Addressing Mode 


The operand is in a register. 





mov EAX, EBX ; move EBX to EAX

















2.3.4 Immediate Addressing Mode 


The operand is part of the instruction. 


mov EAX, 132 ; move 132 to EAX 








2.3.5 Memory Addressing Modes 
Direct addressing mode 


The operand is in memory, and the address is specified as an offset. 





a_letter DB 'c' ; Allocate one byte of memory, initialize it to 'c'. 
mov AL, a_letter ; Move data at memory location "a_letter" into AL. 
; I.e. move 'c' to AL. 


Register Indirect Addressing 


The operand is found at the memory location specified by the register. The register is
enclosed in square brackets.











mov EAX, ESP ; Move stack pointer to EAX 
mov EBX, [ESP] ; Move value at top-of-stack to EBX 




















The first move uses register addressing, and the second uses register indirect addressing.

Indirect Addressing Mode


The offset of the data is in one of the eight general-purpose registers. 








.DATA
array DD 20 DUP (0) ; Array of 20 integers initialized to zero

.CODE
mov ECX, OFFSET array ; Move starting address of 'array' to ECX











The assembler directive OFFSET is used when we want to use the address of an element, 
and not the contents of the element. 


Note that: 





mov ECX, array 


moves the first element in array (array[0]) into ECX, and not the address of the first 
element (&(array[0])). 


Based Addressing 


One of the eight general-purpose registers acts as a base register in computing the
effective address of an operand. The address is computed by adding a signed (8-bit or
32-bit) number to the base address.


mov ECX, 20[EBP] ; ECX = memory[EBP + 20] 














Indexed Addressing 

The effective address is computed by:

(Index * scale factor) + signed displacement.

The beginning of the array is given by a displacement, and the value of the index register
(EAX, EBX, ECX, EDX, ESI, EDI, EBP) selects an element within the array. The scale
factor is used to specify how large the elements in the array are (in bytes). The scale
factor can only be 1, 2, 4 or 8.














add AX, [DI + 20] ; AX = AX + memory[DI + 20]
mov AX, table[ESI*4] ; AX = memory[ OFFSET table + ESI * 4 ]
add AX, table[SI] ; AX = AX + memory[ OFFSET table + SI * 1 ]








Based-Indexed Addressing 
In this addressing mode, the effective address is computed as: 
Base + (Index * Scale factor) + signed displacement. 


The beginning of the array is given by a base register (EAX, EBX, ECX, EDX, ESI, EDI, 
EBP, ESP) and a displacement, and the value of the index register (EAX, EBX, ECX, 
EDX, ESI, EDI, EBP) selects an element within the array. The scale factor is used to 
specify how large the elements in the array are (in bytes). The scale factor can only be 1, 
2, 4 or 8. The signed displacement must be either an 8, 16 or 32-bit value. 






































mov EAX, [EBX+ESI] ; EAX = memory[EBX + (ESI * 1) + 0]
mov EAX, [EBX+ESI*4+2] ; EAX = memory[EBX + (ESI * 4) + 2]
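
The address arithmetic for based-indexed addressing can be written as one C function. This is a sketch under my own naming (effective_address is not a real API). It only mirrors the formula Base + (Index * Scale factor) + signed displacement, with the same wrap-around at 32 bits that the processor has.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of [base + index*scale + disp] in a 32-bit address space. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    /* The hardware only supports these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps modulo 2^32, like the address calculation. */
    return base + index * scale + (uint32_t)disp;
}

void addressing_demo(void)
{
    /* mov EAX, [EBX+ESI]      with EBX = 1000H, ESI = 3 */
    assert(effective_address(0x1000u, 3, 1, 0) == 0x1003u);
    /* mov EAX, [EBX+ESI*4+2]  with EBX = 1000H, ESI = 3 */
    assert(effective_address(0x1000u, 3, 4, 2) == 0x100Eu);
    /* A negative displacement, as used for local variables: [EBP-4] */
    assert(effective_address(0x1000u, 0, 1, -4) == 0x0FFCu);
}
```

Based addressing and indexed addressing are the special cases where the index or the base is left out (that is, zero).
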
The PTR directive 


Sometimes the assembler does not know how large values it is supposed to use, as shown 
in the following example: 


array SDWORD 20 DUP (0) ; int array[20];
mov ECX, OFFSET array ; ECX = &(array[0])
mov [ECX], 25 ; memory[ECX] = 25, but is '25' a 1-byte,
; 2-byte or 4-byte value?





To clarify we use the PTR directive (syntax: type-specifier PTR) 











mov ECX, OFFSET array ; ECX = &(array[0])
mov SDWORD PTR [ECX], 25 ; memory[ECX] = 25, and '25' is a 4-byte
; value (signed doubleword)











You should use the PTR directive whenever the operand size cannot be inferred from a
register name, for example when an immediate value is stored to memory.
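
Why the size matters can be seen in a C sketch that stores the value 25 into a byte array with different operand sizes. The helper store_sized is hypothetical. It writes the value in little-endian order, as the processor does, and touches only as many bytes as the size says.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write "value" to mem as a little-endian integer of 1, 2 or 4 bytes. */
void store_sized(uint8_t *mem, uint32_t value, int size)
{
    assert(size == 1 || size == 2 || size == 4);
    for (int i = 0; i < size; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}

void ptr_demo(void)
{
    uint8_t mem[4];

    /* Like "mov BYTE PTR [ECX], 25": only one byte changes. */
    memset(mem, 0xFF, sizeof mem);
    store_sized(mem, 25, 1);
    assert(mem[0] == 25 && mem[1] == 0xFF && mem[2] == 0xFF && mem[3] == 0xFF);

    /* Like "mov WORD PTR [ECX], 25": two bytes change. */
    memset(mem, 0xFF, sizeof mem);
    store_sized(mem, 25, 2);
    assert(mem[0] == 25 && mem[1] == 0 && mem[2] == 0xFF && mem[3] == 0xFF);

    /* Like "mov DWORD PTR [ECX], 25": all four bytes change. */
    memset(mem, 0xFF, sizeof mem);
    store_sized(mem, 25, 4);
    assert(mem[0] == 25 && mem[1] == 0 && mem[2] == 0 && mem[3] == 0);
}
```

The three stores leave different bytes in memory, which is why the assembler refuses to guess.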


2.4 Stack 


Properties:
• Pointed to by SS:ESP
• In 32-bit code, data is normally pushed as 32-bit values (a single 8-bit register such
as AL cannot be pushed on its own).
• The stack grows downward (toward lower addresses).
• ESP points to the last value pushed on the stack.

Stack operation

push source:
1. ESP = ESP - 4
2. memory[ESP] = source

pop destination:
1. destination = memory[ESP]
2. ESP = ESP + 4

Other stack operations are: pushfd (push EFlags), popfd (pop EFlags), pushad (push all
general-purpose registers) and popad (pop all general-purpose registers). The GNU
assembler calls these pushfl, popfl, pushal and popal.
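
The two steps of push and pop can be simulated in C. The sketch below uses a made-up struct machine with a tiny memory. It is only meant to show the order of the steps and the direction in which ESP moves.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64   /* hypothetical stack size for this sketch */

struct machine {
    uint32_t memory[STACK_WORDS];  /* stack memory, one element per doubleword */
    uint32_t esp;                  /* byte offset of the top of the stack */
};

/* An empty stack: ESP points just past the highest address. */
void machine_init(struct machine *m)
{
    m->esp = STACK_WORDS * 4;
}

void push32(struct machine *m, uint32_t source)
{
    assert(m->esp >= 4);                 /* otherwise the stack overflows */
    m->esp -= 4;                         /* 1. ESP = ESP - 4 */
    m->memory[m->esp / 4] = source;      /* 2. memory[ESP] = source */
}

uint32_t pop32(struct machine *m)
{
    assert(m->esp <= STACK_WORDS * 4 - 4);           /* stack must not be empty */
    uint32_t destination = m->memory[m->esp / 4];    /* 1. destination = memory[ESP] */
    m->esp += 4;                                     /* 2. ESP = ESP + 4 */
    return destination;
}

void stack_demo(void)
{
    struct machine m;
    machine_init(&m);
    uint32_t top = m.esp;

    push32(&m, 1);
    push32(&m, 2);
    assert(m.esp == top - 8);     /* the stack grew downward by 8 bytes */
    assert(pop32(&m) == 2);       /* last in, first out */
    assert(pop32(&m) == 1);
    assert(m.esp == top);
}
```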


2.5 C procedure call convention 


The convention below is used by MASM. I don't know if gas (the GNU assembler, used on
Linux) uses the same convention.

When doing a function call, the caller must:
• Save EAX, ECX and EDX if they must be preserved.
• Push all arguments on the stack. The arguments are pushed from right to left.
• Invoke the function by using the call instruction (call pushes the return address
and jumps to the called function).


Before the called function starts running it must:
• Save EBX, EBP, ESI, EDI, DS and SS if they are clobbered.
• Create a stack frame (if stack frames are used). This is done by setting:
  1. EBP = ESP
  2. ESP = ESP - frame size
  The stack frame must contain space for local variables.
• Save the direction flag (EFlags.DF), if it is altered.


Before the called function returns it must:
• Restore all saved registers and the direction flag (if it was saved).
• Pop the stack frame by setting ESP = EBP.
• Store the return value according to the table below.
• Return to the caller by using the ret instruction (ret pops the return address, and
jumps to it).

After returning from a function call, the caller must:
• Pop all arguments. (Normally ESP is set to ESP + sizeof(arguments))
• Restore all saved registers.


Return Value Data Type    Is Saved in Register
char                      AL
short (16-bit)            AX
int (32-bit)              EAX
64-bit                    EDX:EAX
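
To see where the arguments end up relative to EBP, the calling sequence can be simulated in C. This is a sketch with invented names (struct cpu, enter_function), assuming a function with two int arguments and one local variable. It only models the stack, not the registers that have to be saved.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint32_t mem[64];   /* stack memory, one element per doubleword */
    uint32_t esp;       /* byte offset of the top of the stack */
    uint32_t ebp;       /* frame pointer */
};

void push(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t load(const struct cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < sizeof c->mem);
    return c->mem[addr / 4];
}

/* Everything that happens from "push b" until the callee has its frame. */
void enter_function(struct cpu *c, uint32_t a, uint32_t b, uint32_t return_address)
{
    push(c, b);                /* caller: arguments, right to left */
    push(c, a);
    push(c, return_address);   /* call: pushes the return address */
    push(c, c->ebp);           /* callee: push ebp */
    c->ebp = c->esp;           /* callee: mov ebp, esp */
    c->esp -= 4;               /* callee: sub esp, 4 (one local variable) */
}

void frame_demo(void)
{
    struct cpu c;
    c.esp = sizeof c.mem;      /* empty stack */
    c.ebp = 0x1000u;           /* the caller's frame pointer (arbitrary) */

    enter_function(&c, 14, 21, 0x401000u);

    assert(load(&c, c.ebp)      == 0x1000u);    /* [EBP]      old frame pointer */
    assert(load(&c, c.ebp + 4)  == 0x401000u);  /* [EBP + 4]  return address */
    assert(load(&c, c.ebp + 8)  == 14);         /* [EBP + 8]  first argument */
    assert(load(&c, c.ebp + 12) == 21);         /* [EBP + 12] second argument */
    assert(c.esp == c.ebp - 4);                 /* [EBP - 4]  local variable */
}
```

Because the arguments are pushed from right to left, the first argument is always at EBP + 8, no matter how many arguments follow it.
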
3 MASM Assembler directives


This chapter lists and explains the most important MASM directives. 


The figures are from [MASM] and [Dandamudi]. Most of the examples are also taken 
from this book (this chapter is really a summary of chapter 3 from [Dandamudi]). 


3.1 Data allocation 

The general format of a storage allocator is: 

[variable-name] define-directive initial-value [,initial-value].... 
Variable-name: identify the storage space allocated. 


Define-directive: the following table shows the directives that can be used, and the size 
in bytes: 


The following directives indicate the size and value range of some integers and floating 
point numbers: 


Directive                            Description of Initializers
BYTE, DB (byte)                      Allocates unsigned numbers from 0 to 255.
SBYTE (signed byte)                  Allocates signed numbers from -128 to +127.
WORD, DW (word = 2 bytes)            Allocates unsigned numbers from 0 to 65,535 (64K).
SWORD (signed word)                  Allocates signed numbers from -32,768 to +32,767.
DWORD, DD (doubleword = 4 bytes)     Allocates unsigned numbers from 0 to 4,294,967,295 (4G).
SDWORD (signed doubleword)           Allocates signed numbers from -2,147,483,648 to
                                     +2,147,483,647.
FWORD, DF (farword = 6 bytes)        Allocates 6-byte (48-bit) integers. These values are
                                     normally used only as pointer variables on the
                                     80386/486 processors.
QWORD, DQ (quadword = 8 bytes)       Allocates 8-byte integers used with 8087-family
                                     coprocessor instructions.
TBYTE, DT (10 bytes)                 Allocates 10-byte (80-bit) integers if the initializer has a
                                     radix specifying the base of the number.
REAL4                                Short (32-bit) real numbers
REAL8                                Long (64-bit) real numbers
REAL10                               10-byte (80-bit) real numbers and BCD numbers
Examples

letter_c DB 'c' ; Allocate a single byte of memory, and
; initialize it to the letter 'c'.
an_integer DD 12425 ; Allocate memory for an integer (4 bytes), and
; initialize it to 12425.
a_float REAL4 2.32 ; Allocate memory for a float, and initialize
; it to 2.32
message DB 'Hello',13,0 ; Allocate memory for a null terminated string
; "Hello\r" (13 is a carriage return; use 10 for '\n')
marks DW 0, 0, 0, 0 ; Both allocate memory for an array of 4 * 2
; bytes, and initialize all elements to zero.
marks DW 4 DUP (0) ; DUP allows multiple initializations to the
; same value
name DB 30 DUP(?) ; Allocate memory for 30 bytes, uninitialized.
matrix DQ 12*10 DUP (?) ; Allocate memory for a 12*10 quadword matrix
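
For readers who think in C, the allocations above correspond roughly to the following global definitions. This is my own mapping, not something MASM produces. The variable names are illustrative, and C zero-initializes globals, whereas DUP(?) leaves the memory uninitialized.

```c
#include <assert.h>
#include <stdint.h>

char     c_letter     = 'c';            /* DB 'c'              1 byte  */
int32_t  c_an_integer = 12425;          /* DD 12425            4 bytes */
float    c_a_float    = 2.32f;          /* REAL4 2.32          4 bytes */
char     c_message[]  = "Hello\r";      /* DB 'Hello',13,0     7 bytes */
uint16_t c_marks[4]   = {0, 0, 0, 0};   /* DW 4 DUP (0)        8 bytes */
char     c_name[30];                    /* DB 30 DUP(?)       30 bytes */
uint64_t c_matrix[12][10];              /* DQ 12*10 DUP (?)  960 bytes */

/* Check that the C objects occupy as many bytes as the MASM data. */
void allocation_demo(void)
{
    assert(sizeof c_letter == 1);
    assert(sizeof c_an_integer == 4);
    assert(sizeof c_a_float == 4);      /* true on IA-32 and most other platforms */
    assert(sizeof c_message == 7);      /* 5 letters + carriage return + 0 */
    assert(sizeof c_marks == 8);
    assert(sizeof c_name == 30);
    assert(sizeof c_matrix == 960);
    assert(c_message[5] == 13);         /* '\r' is character code 13 */
}
```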











We can also use the LABEL directive to name a memory location, the syntax is: 


name LABEL type 


3.2 Defining Constants 


3.2.1 The EQU directive 
Syntax: name EQU expression. It serves the same purpose as #define in C. 
3.2.2 The = directive 


Syntax: name = expression. The symbol that is defined by the = directive can be 
redefined, but it cannot be used to define strings. 


3.3 Multiple Source Program Modules 


3.3.1 The PUBLIC Directive 
Syntax: PUBLIC label1, label2, label3...


This directive makes the labels public, and therefore available from other modules 
(source files). 


Examples 


PUBLIC error_msg, table 
PUBLIC _a_C_function ; All C functions begin with an underscore 
3.3.2 The EXTRN directive
Syntax: EXTRN label:type 


This directive can be used to declare extern labels (variables, functions, etc). The table 
below lists some types: 


BYTE Data variable (8-bits) 
WORD Data variable (16-bits) 
DWORD Data variable (32-bits) 
QWORD Data variable (64-bits) 
PROC A procedure name 





Examples 








EXTRN error_msg:BYTE, table:WORD
EXTRN _printf:PROC ; All C functions begin with an
; underscore.











Normally source files are included when compiling, and object files (libraries) when 
linking. 
4 Mixed language programming


This chapter covers three topics: how to write inline assembly in Visual C++, how you 
can use Visual C++ to debug your assembly programs, and how to read assembly listings 
(produced by the compiler). 


4.1 Inline assembly 


Inline assembly is used to insert assembly code into C source files. 


In Visual C++ the keyword __asm is placed before the inline assembly code, as shown in
the examples.

Examples

__asm pushf ; Push the EFlags register

__asm {
    mov EAX, 0
    sub EAX, 12
}


4.2 Using the Visual C++ debugger to test simple assembly programs


If you want to see what simple assembly programs do with the data registers and memory 
you can use the debugger in Visual C++. 


To do this you need to: 


1. Create a new (console) project in Visual C++. 

2. Write your assembly code in inline assembly, as shown below. 

3. Insert a breakpoint at the beginning of the assembly code (right click | Insert 
Breakpoint) 

4. Start debugging (Build | Start Debug | Go (F5)). 

5. View register window or/and memory window (View | Debug Windows | 
Registers or Memory). 

6. Step through the program (Debug | Step into (F11)). Then you can see what the
registers and memory contain after each executed instruction.

7. When you are done, you stop the debugger (Debug | Stop Debugging)


We can see in the figure that:

• The breakpoint is set to the beginning of the assembly code (red bullet).

• The next instruction to be executed is add eax, ebx (yellow arrow).

• The last instruction executed changed registers EBX and EIP (red color in
Registers window).





[Figure: the Visual C++ debugger stopped in debugger.c, with the source window showing the
inline assembly code below and the Registers window open]

__asm {
    mov eax, 0   ; eax = 0
    mov ebx, 7   ; ebx = 7
    add eax, ebx ; eax = eax + ebx
}
return 0;

Note: By using this method your C programs will probably not function correctly, unless
you save and restore all registers that are clobbered in the assembly code.


4.3 Using assembly files in Visual C++ 


This chapter tells you how to use assembly files in a Visual C++ project. 


First we write a C source file, which calls the assembly function: 


#include <stdio.h>

/* Return a+b
 * This function is in func.asm
 */
extern int assembly_function(int a, int b);

int main(void)
{
    printf("14 + 21 = %d\n", assembly_function(14, 21));

    return 0;
}


Then we write an assembly file that contains the function we are interested in: 


.586                            ; 32-bits (with Pentium instructions)
.MODEL flat                     ; Flat memory model (no segmentation)

EXTERN _printf:NEAR             ; printf is an external function

; assembly_function is a public function
; Note that all C function names begin with an underscore
PUBLIC _assembly_function

.DATA                           ; Begin data segment

; printf() string (null terminated). MASM does not interpret \n inside a
; string, so the newline is given as character code 10.
printf_msg DB 'Arguments: %d and %d', 10, 0

.CODE                           ; Begin code segment

; assembly_function in C:
; int assembly_function(int a, int b)
; {
;     int c = a + b;
;     printf("Arguments: %d and %d\n", a, b);
;     return c;
; }

; The = directive does the same as #define in C

; Location of arguments on the stack frame
_arg1 = 8                       ; EBP + 8 = arg1 (a)
_arg2 = 12                      ; EBP + 12 = arg2 (b)
; Location of local variables
_loc1 = -4                      ; EBP - 4 = local variable (c)

_assembly_function:
    push ebp                    ; Save old base pointer
    mov ebp, esp                ; Point EBP to top of stack
    sub esp, 4                  ; Make space on stack for local variable

    mov eax, SDWORD PTR _arg1[ebp]  ; Move argument 1 into eax
                                ; Note that we specify that we are interested
                                ; in moving 4 bytes (SDWORD PTR)
    mov edx, SDWORD PTR _arg2[ebp]  ; Move argument 2 into edx
    mov ecx, eax                ; ecx = eax
    add ecx, edx                ; ecx = eax + edx

    ; Save caller saved registers
    ; Note that we don't need to save eax and edx, because they don't
    ; need to be preserved
    push ecx

    ; Push printf arguments
    push edx                    ; Push argument 3
    push eax                    ; Push argument 2
    push OFFSET printf_msg      ; Push address of string (argument 1)
    call _printf
    add esp, 12                 ; Pop arguments

    ; Restore caller saved registers
    pop ecx

    mov eax, ecx                ; Store return value in eax
    mov esp, ebp                ; ESP points to top of stack frame
    pop ebp                     ; Restore EBP register
    ret                         ; Return to caller

; End of source file
end







Then we need to assemble this file when building the project:

1. Insert file into project
2. Right-click on the filename in the File View, and choose Settings.
3. Select the Custom Build tab.
4. In commands you write:
   c:\masm611\bin\ml /c /coff /Zd $(InputName).asm
5. And in Outputs you write:
   $(InputName).obj
6. Compile and run as usual.
4.4 Understanding Assembly Listings


One way to learn assembly programming is to study assembly listings produced by the
compiler. In this section I have commented the assembly listing produced by the
Microsoft compiler for the C program given below.


C code: 


#include <stdio.h>
#include "error_wrapper.h"

/* Just open the file given as the first command line
 * argument.
 */
int main(int argc, char *argv[])
{
    FILE *f;

    /* First argument is the name of the executable file */
    setprogname(argv[0]);

    /* Second argument is the file to be opened */
    if (argc < 2)
        eprintf("Usage: error_wrapper filename");

    f = fopen(argv[1], "r");
    if (f == NULL)
        eprintf("can't open file: %s", argv[1]);
    fclose(f);

    printf("File opened and closed without errors\n");

    return 0;
}


Assembly output (my comments begin with three semicolons):


;;; Name of the C file
    TITLE H:\d241_a00\assembly_example\error_wrapper_test.c

;;; 386 processor mode (P: enable the instructions available only at
;;; higher privilege levels)
    .386P

;;; This file contains assembler macros and is included by the files
;;; created with the -FA compiler switch to be assembled by MASM.
include listing.inc

;;; If MASM version > 5.1 then use flat memory model
;;; (no segmentation, code and data in the same segment)
;;; We use FLAT in Windows 2000
if @Version gt 510
.model FLAT
;;; Ignore this
else
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
$$SYMBOLS   SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS   ENDS
$$TYPES SEGMENT BYTE USE32 'DEBTYP'
$$TYPES ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
;   COMDAT ??_C@_0BO@HHKP@Usage?3?5error_wrapper?5filename?$AA@
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
;   COMDAT ??_C@_01LHO@r?$AA@
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
;   COMDAT ??_C@_0BE@OCM@can?8t?5open?5file?3?5?$CFs?$AA@
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
;   COMDAT ??_C@_0CH@BNAK@File?5opened?5and?5closed?5without?5e@
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
;   COMDAT _main
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME  CS: FLAT, DS: FLAT, SS: FLAT
endif
;;; Main is a public function (other modules can call it)
PUBLIC  _main
;;; These are static string labels.
;;; Static strings must be public so that other modules can use them.
;;; (the module which printf() is in must access string ABC when we
;;; use: printf("ABC");
PUBLIC  ??_C@_0BO@HHKP@Usage?3?5error_wrapper?5filename?$AA@ ; `string'
PUBLIC  ??_C@_01LHO@r?$AA@                  ; `string'
PUBLIC  ??_C@_0BE@OCM@can?8t?5open?5file?3?5?$CFs?$AA@ ; `string'
PUBLIC  ??_C@_0CH@BNAK@File?5opened?5and?5closed?5without?5e@ ; `string'
;;; These are external functions
EXTRN   _fclose:NEAR
EXTRN   _fopen:NEAR
EXTRN   _printf:NEAR
EXTRN   _eprintf:NEAR
EXTRN   _setprogname:NEAR


;;; I think this one is a debugging function used to check that the
;;; stack frame is restored correctly.
EXTRN   __chkesp:NEAR
;   COMDAT ??_C@_0BO@HHKP@Usage?3?5error_wrapper?5filename?$AA@
; File H:\d241_a00\assembly_example\error_wrapper_test.c

;;; CONST is used to define constant data that must be stored in
;;; memory.
;;; SEGMENT: We define data in segments
;;; ??_C...: I think is a label
;;; DB: byte aligned
;;; 'Usage...DB...filename', 00H: Data (a null terminated string)
;;; `string': class, used to organize segments
;;; ENDS: end of this segment
CONST   SEGMENT
??_C@_0BO@HHKP@Usage?3?5error_wrapper?5filename?$AA@ DB 'Usage: error_wra'
    DB  'pper filename', 00H                ; `string'
CONST   ENDS
;;; More static string definitions
;   COMDAT ??_C@_01LHO@r?$AA@
CONST   SEGMENT
??_C@_01LHO@r?$AA@ DB 'r', 00H              ; `string'
CONST   ENDS
;   COMDAT ??_C@_0BE@OCM@can?8t?5open?5file?3?5?$CFs?$AA@
CONST   SEGMENT
??_C@_0BE@OCM@can?8t?5open?5file?3?5?$CFs?$AA@ DB 'can''t open file: %s', 00H ; `string'
CONST   ENDS
;   COMDAT ??_C@_0CH@BNAK@File?5opened?5and?5closed?5without?5e@
CONST   SEGMENT
??_C@_0CH@BNAK@File?5opened?5and?5closed?5without?5e@ DB 'File opened and'
    DB  ' closed without errors', 0aH, 00H  ; `string'
CONST   ENDS
;   COMDAT _main
;;; Start of text (code) segment
_TEXT   SEGMENT
;;; To access argument argc in the stack frame we can add _argc$ to the
;;; address EBP points to.
;;; At memory location [EBP] + 4 is the return address of the function
;;; that called this function
;;; At memory location [EBP] is the old stack frame pointer.
_argc$ = 8
_argv$ = 12
;;; If we had a third argument it would be at memory location [EBP] + 16
;;; To access local variable f in the stack frame we can add _f$ to the
;;; address EBP points to.
_f$ = -4
;;; If we had more local variables they would be at memory location:
;;; [EBP] - 8, [EBP] - 12... (Note that even char's use 4 bytes, we
;;; cannot push one byte on the stack)





;;; Main is a public procedure, and the code starts here.
_main   PROC NEAR                       ; COMDAT
;;; C code
; 8    : {
;;; Prologue code
;;; Save the old stack frame pointer
    push    ebp
;;; Establish a new stack frame
    mov     ebp, esp
;;; Create room for local variables. I don't know why it subtracts 68
;;; bytes when there is only one local variable. Performance?
    sub     esp, 68                     ; 00000044H
;;; Save callee saved registers used in this function.
    push    ebx
    push    esi
    push    edi
;;; I think the following code "clears" the stack area reserved for
;;; local variables.
;;; Compute the effective address of the start of the local variable
;;; area (EBP - 68) and store it in EDI.
    lea     edi, DWORD PTR [ebp-68]
;;; We want to repeat the string instruction (stosd) 17 times.
    mov     ecx, 17                     ; 00000011H
;;; I have no idea why the value ccccccccH is used.
    mov     eax, -858993460             ; ccccccccH
;;; for (i = 0; i < 17; i++)
;;;     Store EAX at address (EDI + 4 * i)
    rep stosd


; 9    :     FILE *f;
; 10   :
; 11   :     /* First argument is the name of the executable file */
; 12   :     setprogname(argv[0]);
;;; Move argument argv into eax
    mov     eax, DWORD PTR _argv$[ebp]
;;; Move argv[0] into ecx
    mov     ecx, DWORD PTR [eax]
;;; Push the first (and only) argument...
    push    ecx
;;; ...and call the function
    call    _setprogname
;;; Pop the argument.
    add     esp, 4
; 13   :
; 14   :     /* Second argument is the file to be opened */
; 15   :     if (argc < 2)
;;; In IA-32 one of the operands can be in memory
    cmp     DWORD PTR _argc$[ebp], 2
;;; If 1st operand >= 2nd operand then goto label $L363
    jge     SHORT $L363
; 16   :         eprintf("Usage: error_wrapper filename");
;;; Push static string as 1st argument
    push    OFFSET FLAT:??_C@_0BO@HHKP@Usage?3?5error_wrapper?5filename?$AA@ ; `string'
;;; And call eprintf
    call    _eprintf
;;; Pop argument
    add     esp, 4
;;; Label to jump to, if argc >= 2
$L363:
; 17   :

; 18   :     f = fopen(argv[1], "r");
;;; Push second argument, the static string "r"
    push    OFFSET FLAT:??_C@_01LHO@r?$AA@  ; `string'
    mov     edx, DWORD PTR _argv$[ebp]
;;; Move argv[1] into eax...
    mov     eax, DWORD PTR [edx+4]
;;; ...and push it as the first argument...
    push    eax
;;; ...to function fopen, which is called.
    call    _fopen
;;; Pop the arguments
    add     esp, 8
;;; Save the return value on the stack
;;; When compiled with debugging on, every time a local variable is
;;; changed, it is stored on the stack
    mov     DWORD PTR _f$[ebp], eax
; 19   :     if (f == NULL)
    cmp     DWORD PTR _f$[ebp], 0
    jne     SHORT $L367
; 20   :         eprintf("can't open file: %s", argv[1]);
;;; Push argv[1] (the second argument)
    mov     ecx, DWORD PTR _argv$[ebp]
    mov     edx, DWORD PTR [ecx+4]
    push    edx
;;; And push the static string as the first argument
    push    OFFSET FLAT:??_C@_0BE@OCM@can?8t?5open?5file?3?5?$CFs?$AA@ ; `string'
    call    _eprintf
;;; Pop arguments
    add     esp, 8
;;; Jump to this label if (f != NULL)
$L367:
; 21   :     fclose(f);
;;; Move local variable f into eax, push it, call fclose, and pop the
;;; argument.
    mov     eax, DWORD PTR _f$[ebp]
    push    eax
    call    _fclose
    add     esp, 4
; 22   :
; 23   :     printf("File opened and closed without errors\n");
    push    OFFSET FLAT:??_C@_0CH@BNAK@File?5opened?5and?5closed?5without?5e@ ; `string'
    call    _printf
    add     esp, 4
; 24   :
; 25   :     return 0;
;;; Return value is stored in eax
;;; xor eax, eax is a fast way to store zero in eax
    xor     eax, eax
;;; Epilogue code
;;; Restore saved callee save registers
    pop     edi
    pop     esi
    pop     ebx
;;; Pop the stack frame
    add     esp, 68                     ; 00000044H
;;; I think this is a test to check that the old stack frame is
;;; restored
    cmp     ebp, esp
    call    __chkesp
;;; Restore the old stack frame.
    mov     esp, ebp
;;; Pop old stack frame pointer
    pop     ebp
;;; Return without popping any arguments.
    ret     0
;;; End of procedure main
_main   ENDP
;;; End of code segment
_TEXT   ENDS
;;; End of the source code
END














= 
i 
Ax 











5 Examples 


This chapter contains a lot of examples. You should read and understand them all. 


5.1 Arithmetic Instructions 


C program: 


f = (g + h) - (i + j);

Assembly:

; We assume that f, g, h, i and j are assigned to registers EAX, EBX,
; ECX, EDX and ESI

    mov EDI, EBX ; EDI = g
    add EDI, ECX ; EDI = g + h
    mov EAX, EDI ; EAX = (g + h)
    mov EDI, EDX ; EDI = i
    add EDI, ESI ; EDI = i + j
    sub EAX, EDI ; EAX = (g + h) - (i + j)
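
The two-operand form used above (the destination is also the first source) can be mirrored in C with compound assignments. This is only a sketch: the function name and the variables `eax` and `edi` are hypothetical stand-ins for the registers, and unsigned arithmetic is used so that wraparound behaves as it does in a 32-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Each statement corresponds to one two-operand instruction of the
   assembly example. The names eax and edi are illustrative only. */
int32_t f_two_operand(int32_t g, int32_t h, int32_t i, int32_t j)
{
    uint32_t eax, edi;

    edi = (uint32_t)g;      /* mov EDI, EBX */
    edi += (uint32_t)h;     /* add EDI, ECX */
    eax = edi;              /* mov EAX, EDI */
    edi = (uint32_t)i;      /* mov EDI, EDX */
    edi += (uint32_t)j;     /* add EDI, ESI */
    eax -= edi;             /* sub EAX, EDI */
    return (int32_t)eax;    /* f = (g + h) - (i + j) */
}
```

Calling `f_two_operand(5, 3, 2, 1)` follows the same six steps as the assembly and yields (5 + 3) - (2 + 1).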


5.2 Data Transfer (mov instruction) 


.DATA
a_letter DB 'o'             ; Allocate one byte of memory, initialize
                            ; it to 'o'
array DD 20 DUP (0)         ; Array of 20 integers initialized to zero
qwa SQWORD 25 DUP (?)       ; Array of 25 quadwords (64 bits),
                            ; uninitialized

.CODE
    mov EAX, EBX            ; EAX = EBX
    mov EAX, 132            ; EAX = 132
    mov a_letter, AL        ; memory[a_letter] = AL (8 lsb of EAX)
    mov EAX, [ESP]          ; EAX = memory[ESP]
    mov ECX, OFFSET array   ; ECX = &(array[0])
    mov ECX, array          ; ECX = array[0]
    mov EAX, array[ESI*4]   ; EAX = memory[OFFSET array + ESI * 4]
    mov EAX, [EBX+ESI]      ; EAX = memory[EBX + (ESI * 1) + 0]
    mov EAX, [EBX+ESI*4+2]  ; EAX = memory[EBX + (ESI * 4) + 2]
    mov ECX, OFFSET array   ; ECX = &(array[0])
    mov DWORD PTR [ECX], 25 ; memory[ECX] = array[0] = 25
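
The memory operands above all follow the pattern base + index * scale + displacement. The following C sketch computes such an address and loads a double word from a simulated memory image. The function names and the byte-array memory model are assumptions made for illustration; they are not part of MASM or the processor.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an operand such as [EBX+ESI*4+2]:
   base + index * scale + displacement, with 32-bit wraparound. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}

/* Load a 32-bit value stored in little-endian byte order (as IA-32 does)
   from a simulated memory image, like "mov EAX, [addr]".
   The caller must make sure that addr + 3 lies inside mem. */
uint32_t load_dword(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

For example, `effective_address(100, 3, 4, 2)` corresponds to `[EBX+ESI*4+2]` with EBX = 100 and ESI = 3.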


5.2 Jumps 


5.2.1 Unconditional Jump 


Infinite loop: 


forever: 


jmp forever 


5.2.2 Conditional Jumps 


If then else

if (a < 0) {
    b = -5;
}
else if (a > 0) {
    b = 5;
}
else {
    b = 0;
}

; Assume that a is in EAX, and that b is assigned to EBX

    cmp EAX, 0
    jge larger          ; if (a >= 0) goto larger;
smaller:                ; a < 0
    mov EBX, -5         ; b = -5
    jmp exit_if
larger:
    cmp EAX, 0
    jle equal           ; if (a == 0) goto equal (we know that a >= 0, so it
                        ; cannot be < 0)
    mov EBX, 5          ; b = 5
    jmp exit_if
equal:
    mov EBX, 0          ; b = 0
exit_if:                ; End of if then else

Loop

int i, vector[25];

for (i = 0; i < 25; i++)
    vector[i] = 0;

Assembly:


; vector[] is allocated memory on the stack

_vector = -112 ; Start address of vector is EBP - 112

; Remember that the stack grows downward, and that local
; variables are below the frame pointer.

; You should also note that:
; vector[0] = memory[ebp + _vector]
; vector[13] = memory[ebp + _vector + 13*4]

start_loop:
    mov ECX, 0          ; i = 0
    jmp init_loop

next_iter:              ; LOOP is an instruction mnemonic, so it cannot be
                        ; used as a label name
    add ECX, 1          ; i++

init_loop:
    cmp ECX, 25
    jge exit_loop       ; if (i >= 25) goto exit_loop

body:
    ; memory[ebp + _vector + ecx * 4] = 0
    ; DWORD PTR because we want to move a double word (remember
    ; that vector is an array of ints)
    mov DWORD PTR _vector[ebp + ecx * 4], 0
end_body:
    jmp next_iter

exit_loop:
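
One way to see how the for loop maps onto the jumps is to rewrite it in C with goto, one label per jump target. This is a sketch for illustration only; the function name `clear_vector`, the macro `VECTOR_LEN` and the label `increment` are hypothetical names, not taken from the assembly.

```c
#include <assert.h>

#define VECTOR_LEN 25

/* The for loop written with gotos, following the control flow of the
   assembly: initialize, jump to the test, and jump back to the increment
   step after each execution of the body. */
void clear_vector(int vector[VECTOR_LEN])
{
    int i;

    i = 0;                      /* mov ECX, 0 */
    goto init_loop;             /* jmp init_loop */
increment:
    i = i + 1;                  /* add ECX, 1 */
init_loop:
    if (i >= VECTOR_LEN)        /* cmp ECX, 25 */
        goto exit_loop;         /* jge exit_loop */
    vector[i] = 0;              /* mov DWORD PTR _vector[ebp + ecx * 4], 0 */
    goto increment;             /* jump back to the increment step */
exit_loop:
    return;
}
```

The test sits between the increment and the body, so a loop whose condition is false from the start never executes the body.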


5.3 Function calls 


C code: 


void *emalloc(size_t size)
{
    void *rp;

    if (((rp = malloc(size))) == NULL) {
        printf("Malloc error\n");
        exit(1);
    }
    return rp;
}


Assembly: 


PUBLIC _emalloc

.DATA

malloc_string DB 'Malloc error',10,0 ; Allocate memory for a
                                     ; null terminated string
                                     ; (10 = '\n')

.CODE

_size$ = 8  ; memory[ebp + _size$] = argument
_rp$ = -4   ; memory[ebp + _rp$] = local variable

_emalloc:
    push ebp                    ; Save old frame pointer
    mov ebp, esp                ; Create a new stack frame
    sub esp, 4                  ; Allocate memory for local
                                ; variables

    mov eax, _size$[ebp]        ; Move argument 'size' to EAX
    push eax                    ; Push malloc() argument
    call _malloc                ; Call malloc
    add esp, 4                  ; Pop arguments
    mov _rp$[ebp], eax          ; Save return value on the stack
    cmp eax, 0                  ; Return value is in EAX (is
                                ; compared to 0 = NULL)
    jne no_error                ; if (return value != NULL) goto
                                ; no_error

error:
    push OFFSET malloc_string   ; First argument to printf() is
                                ; the address of the string
    call _printf                ; Call printf
    add esp, 4                  ; Pop argument
    push 1                      ; First argument to exit()
    call _exit                  ; Call exit
                                ; Note that we never return from exit()

no_error:
    mov eax, _rp$[ebp]          ; Move return value into eax
    mov esp, ebp                ; Restore stack pointer
    pop ebp                     ; Restore old frame pointer
    ret                         ; Return to caller
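
To check why the argument is found at memory[ebp + 8], the call sequence can be replayed on a toy stack. The sketch below is an assumption-laden model, not real machine behaviour: `struct machine`, `push`, `pop`, `argument_seen_by_callee` and the return address 0x1234 are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u

/* A toy 32-bit stack; esp and ebp are byte offsets into mem. */
struct machine {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp;
    uint32_t ebp;
};

static void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;                    /* the stack grows downward */
    m->mem[m->esp / 4] = value;
}

static uint32_t pop(struct machine *m)
{
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

/* Replays caller and callee steps for a one-argument call and returns the
   value the callee finds at memory[ebp + 8]. Returns 0xFFFFFFFF if the
   stack pointer is not back at its starting value afterwards. */
uint32_t argument_seen_by_callee(uint32_t size)
{
    struct machine m = { {0}, STACK_BYTES, STACK_BYTES };
    uint32_t seen;

    push(&m, size);                 /* caller: push the argument          */
    push(&m, 0x1234u);              /* call: push a made-up return address */
    push(&m, m.ebp);                /* callee: push ebp                   */
    m.ebp = m.esp;                  /* callee: mov ebp, esp               */
    m.esp -= 4;                     /* callee: sub esp, 4 (local rp)      */

    seen = m.mem[(m.ebp + 8) / 4];  /* _size$[ebp] = memory[ebp + 8]      */

    m.esp = m.ebp;                  /* callee: mov esp, ebp               */
    m.ebp = pop(&m);                /* callee: pop ebp                    */
    (void)pop(&m);                  /* ret: pop the return address        */
    m.esp += 4;                     /* caller: add esp, 4                 */

    return (m.esp == STACK_BYTES) ? seen : 0xFFFFFFFFu;
}
```

In this model memory[ebp] holds the saved EBP, memory[ebp + 4] the return address, and memory[ebp + 8] the first argument, while the local variable lives at memory[ebp - 4].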


5.4 The most useful IA-32 Instructions 


Category            Instruction              Example                   Meaning

Arithmetic          add                      add EAX, EBX              EAX = EAX + EBX
                    subtract                 sub EAX, EBX              EAX = EAX - EBX
                    add immediate            add EAX, 200              EAX = EAX + 200
                    add unsigned (1)         add EAX, EBX              EAX = EAX + EBX
                    add immediate unsigned   Doesn't exist
                    multiply                 imul EBX                  EDX:EAX = EAX * EBX
                                             imul ECX, EBX             ECX = ECX * EBX
                                             imul ECX, EBX, 200        ECX = EBX * 200
                                             imul ECX, 200             ECX = ECX * 200
                    multiply unsigned        mul ECX                   EDX:EAX = EAX * ECX
                    divide                   idiv ECX                  EAX = EDX:EAX / ECX,
                                                                       EDX = EDX:EAX % ECX
                    divide unsigned          div ECX                   EAX = EDX:EAX / ECX,
                                                                       EDX = EDX:EAX % ECX

Logical             and                      and EAX, EBX              EAX = EAX & EBX
                    or                       or EAX, EBX               EAX = EAX | EBX
                    shift left logical       shl EAX, CL               EAX = EAX << CL
                    shift right logical      shr EAX, CL               EAX = EAX >> CL
                    xor                      xor EAX, EBX              EAX = EAX ^ EBX

Data transfer (2)   move                     mov EAX, 200              EAX = 200
                                             mov EAX, [ESP]            EAX = memory[ESP]
                                             mov EAX, label            EAX = memory[label]
                                             mov EAX, OFFSET array     EAX = &(array[0])
                                             mov EAX, table[ESI*4] (3) EAX = memory[OFFSET table + ESI * 4]
                                             mov EAX, [EBX+ESI*4+2]    EAX = memory[EBX + (ESI * 4) + 2]
                    push                     push EAX                  ESP = ESP - 4;
                                                                       memory[ESP] = EAX
                    pop                      pop EAX                   EAX = memory[ESP];
                                                                       ESP = ESP + 4

Compare             compare                  cmp EAX, EBX              EAX - EBX. Set control flags.

Conditional         jump if equal            je label                  Jump to label if Zero Flag (ZF) is set
jumps (4)           jump if zero             jz label                  Jump to label if ZF = 1
                    jump if not equal        jne label                 Jump to label if ZF = 0
                    jump if not zero         jnz label                 Jump to label if ZF = 0
                    jump if CX is zero       jcxz label                Jump to label if CX = 0
                    jump if carry            jc label                  Jump to label if Carry Flag (CF) is set
                    jump if not carry        jnc label                 Jump to label if CF = 0
                    jump if overflow         jo label                  Jump to label if Overflow Flag (OF) is set
                    jump if not overflow     jno label                 Jump to label if OF = 0
                    jump if sign             js label                  Jump to label if Sign Flag (SF) is set
                                                                       (negative sign)
                    jump if not sign         jns label                 Jump to label if SF = 0 (positive sign)

Unconditional jump  jump                     jmp label                 Jump to label
                                             jmp EAX                   Jump to address in EAX
                    call function            call label                Push EIP and jump to label

Other               increment                inc EAX                   EAX = EAX + 1
                    decrement                dec EAX                   EAX = EAX - 1
                    no operation             nop                       Do nothing
                    test                     test EAX, EBX             EAX & EBX. Set control flags.
                    exchange values          xchg EAX, EBX             tmp = EAX; EAX = EBX; EBX = tmp

1 The processor doesn't care whether a value is signed; the same instruction produces the correct result for both interpretations.

2 Note that IA-32 is not a load-store architecture; most of the instructions can have one of the operands in memory.

3 In 32-bit addressing, any general-purpose register except ESP can be used as the index register.

4 Use cmp or test to set control flags (ZF, CF, OF, SF, PF).
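
The relation between cmp, the control flags and the conditional jumps can be sketched in C. The code below is an illustrative model under stated assumptions: `struct flags`, `cmp_flags` and the `taken_*` helpers are hypothetical names. It computes the flags that `cmp a, b` sets and evaluates the conditions for je, and for the signed jumps jge and jle used in section 5.2.2 (jge is taken when SF = OF, jle when ZF = 1 or SF != OF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool zf, sf, cf, of; };

/* Flags set by "cmp a, b": the processor computes a - b, sets the
   flags from the result, and discards the result itself. */
struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;

    f.zf = (r == 0);                            /* result is zero        */
    f.sf = (r >> 31) != 0;                      /* sign bit of result    */
    f.cf = (a < b);                             /* unsigned borrow       */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;    /* signed overflow       */
    return f;
}

bool taken_je(struct flags f)  { return f.zf; }
bool taken_jge(struct flags f) { return f.sf == f.of; }          /* signed >= */
bool taken_jle(struct flags f) { return f.zf || f.sf != f.of; }  /* signed <= */
```

Because the same subtraction sets both CF and OF, one cmp serves signed and unsigned comparisons alike; only the jump that follows decides which interpretation is used.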